
    Heterogeneous hierarchical workflow composition

    Workflow systems promise scientists an automated end-to-end path from hypothesis to discovery. However, expecting any single workflow system to deliver such a wide range of capabilities is impractical. A more practical solution is to compose the end-to-end workflow from more than one system. With this goal in mind, the integration of task-based and in situ workflows is explored; the result is a hierarchical, heterogeneous workflow composed of subworkflows, with different levels of the hierarchy using different programming, execution, and data models. Materials science use cases demonstrate the advantages of such heterogeneous hierarchical workflow composition. This work is a collaboration between Argonne National Laboratory and the Barcelona Supercomputing Center within the Joint Laboratory for Extreme-Scale Computing. This research is supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under contract number DE-AC02-06CH11357, program manager Laura Biven, and by the Spanish Government (SEV2015-0493), the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), and the Generalitat de Catalunya (contract 2014-SGR-1051).
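
    The composition pattern described above can be pictured with a minimal, self-contained sketch: an outer task-based subworkflow fans out independent tasks, and each task internally runs an in situ subworkflow that couples simulation and analysis in memory. The sketch is illustrative only; the function names and the use of Python's concurrent.futures are assumptions standing in for the actual workflow systems used in the paper.

```python
# A minimal sketch of the composition idea, not the systems used in the paper:
# an outer task-based subworkflow (independent tasks, coarse data exchange)
# whose tasks each run an inner "in situ" subworkflow (simulation and analysis
# coupled in memory). All names here are illustrative.
from concurrent.futures import ProcessPoolExecutor

def insitu_subworkflow(params):
    """Inner subworkflow: simulation steps with analysis coupled in memory,
    so only a reduced result leaves the task (no intermediate files)."""
    state = params["temperature"]
    reduced = []
    for step in range(5):
        state = state * 0.99 + step          # stand-in for a simulation step
        reduced.append(state % 7)            # stand-in for in situ analysis
    return {"temperature": params["temperature"], "summary": sum(reduced)}

def task_based_outer_workflow(parameter_sweep):
    """Outer subworkflow: a task-parallel sweep; each task is an opaque
    subworkflow with its own programming, execution, and data model."""
    with ProcessPoolExecutor(max_workers=2) as pool:
        return list(pool.map(insitu_subworkflow, parameter_sweep))

if __name__ == "__main__":
    sweep = [{"temperature": t} for t in (300.0, 400.0, 500.0)]
    for result in task_based_outer_workflow(sweep):
        print(result)
```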

    On the energy footprint of I/O management in Exascale HPC systems

    The advent of unprecedentedly scalable yet energy-hungry Exascale supercomputers poses a major challenge in sustaining a high performance-per-watt ratio. With I/O management acquiring a crucial role in supporting scientific simulations, various I/O management approaches have been proposed to achieve high performance and scalability. However, how these approaches affect energy consumption has not been studied in detail yet. This paper therefore explores how much energy a supercomputer consumes while running scientific simulations under various I/O management approaches. In particular, we closely examine three radically different I/O schemes: time partitioning, dedicated cores, and dedicated nodes. To do so, we implement the three approaches within the Damaris I/O middleware and perform extensive experiments with one of the target HPC applications of the Blue Waters sustained-petaflop supercomputer project: the CM1 atmospheric model. Our experimental results, obtained on the French Grid'5000 platform, highlight the differences among these three approaches and illustrate how various configurations of the application and of the system can impact performance and energy consumption. Moreover, we propose and validate a mathematical model that estimates the energy consumption of an HPC simulation under different I/O approaches. Our model gives hints to pre-select the most energy-efficient I/O approach for a particular simulation on a particular HPC system, and therefore provides a step towards energy-efficient HPC simulations on Exascale systems. To the best of our knowledge, our work provides the first in-depth look into the energy-performance tradeoffs of I/O management approaches.
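
    As a rough illustration of the kind of model the paper proposes, the sketch below estimates energy as power integrated over run time, with the I/O approach determining how much I/O overlaps with computation and how many resources stay busy. The parameter names, power values, and the specific formulas are assumptions for illustration; the paper's actual model may differ.

```python
# A hedged, simplified sketch of an energy estimate of the kind the paper
# builds; the actual model and parameters in the paper may differ.
from dataclasses import dataclass

@dataclass
class Run:
    n_compute: int      # nodes doing computation
    n_dedicated: int    # extra nodes (or cores, scaled) dedicated to I/O
    p_active: float     # W per node when busy
    p_idle: float       # W per node when idle
    t_compute: float    # s of computation
    t_io: float         # s of I/O when performed synchronously

def energy_time_partitioning(r: Run) -> float:
    """Compute and I/O phases alternate on the same nodes: they serialize."""
    return r.n_compute * r.p_active * (r.t_compute + r.t_io)

def energy_dedicated_resources(r: Run) -> float:
    """Dedicated resources absorb I/O in the background: run time is close to
    t_compute, but the dedicated nodes draw power for the whole run."""
    e_compute = r.n_compute * r.p_active * r.t_compute
    e_io_nodes = r.n_dedicated * (r.p_active * r.t_io
                                  + r.p_idle * max(r.t_compute - r.t_io, 0.0))
    return e_compute + e_io_nodes

if __name__ == "__main__":
    r = Run(n_compute=64, n_dedicated=4, p_active=220.0, p_idle=90.0,
            t_compute=600.0, t_io=120.0)
    # 1 kWh = 3.6e6 J
    print("time partitioning  :", energy_time_partitioning(r) / 3.6e6, "kWh")
    print("dedicated resources:", energy_dedicated_resources(r) / 3.6e6, "kWh")
```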

    Enabling Fast Failure Recovery in Shared Hadoop Clusters: Towards Failure-Aware Scheduling

    Hadoop has emerged as the de facto state-of-the-art system for MapReduce-based data analytics. The reliability of Hadoop systems depends in part on how well they handle failures. Currently, Hadoop handles machine failures by re-executing all the tasks of the failed machines (i.e., executing recovery tasks). Unfortunately, this elegant solution is entirely entrusted to the core of Hadoop and hidden from Hadoop schedulers. This unawareness of failures may therefore prevent Hadoop schedulers from operating correctly towards meeting their objectives (e.g., fairness, job priority) and can significantly impact the performance of MapReduce applications. This paper presents Chronos, a failure-aware scheduling strategy that enables early yet smart action for fast failure recovery while still operating within a specific scheduler objective. Upon failure detection, rather than waiting an uncertain amount of time to acquire resources for recovery tasks, Chronos leverages a lightweight preemption technique to carefully allocate these resources. In addition, Chronos considers data locality when scheduling recovery tasks to further improve performance. We demonstrate the utility of Chronos by combining it with the Fifo and Fair schedulers. The experimental results show that Chronos recovers to a correct scheduling behavior within only a couple of seconds and reduces job completion times by up to 55% compared to state-of-the-art schedulers.
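
    The scheduling idea can be sketched as follows: when a failure is detected, free execution slots for recovery tasks right away (by preemption if necessary) and prefer nodes that hold a replica of the lost task's input. The data structures and the preemption step below are hypothetical simplifications, not the actual Chronos implementation.

```python
# A hedged sketch of the scheduling idea, not the actual Chronos code:
# place recovery tasks early, preempting a running task when no slot is
# free, and prefer data-local nodes. Task, Node, and scheduler_priority
# are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Task:
    job_id: str
    input_hosts: frozenset      # nodes holding a replica of the task's input

@dataclass
class Node:
    name: str
    slots: int
    running: list = field(default_factory=list)

def schedule_recovery(recovery_tasks, nodes, scheduler_priority):
    """Place each recovery task early: data-local nodes first, preempting the
    lowest-priority running task (per the scheduler's own policy) if needed."""
    placements = []
    for task in recovery_tasks:
        # Data-local candidates first, then the rest.
        candidates = sorted(nodes, key=lambda n: n.name not in task.input_hosts)
        for node in candidates:
            if len(node.running) < node.slots:       # free slot: no preemption
                node.running.append(task)
                placements.append((task.job_id, node.name, None))
                break
            victim = min(node.running, key=scheduler_priority, default=None)
            if victim is not None and scheduler_priority(victim) < scheduler_priority(task):
                node.running.remove(victim)          # pause victim (lightweight preemption)
                node.running.append(task)
                placements.append((task.job_id, node.name, victim.job_id))
                break
    return placements

if __name__ == "__main__":
    # Hypothetical policy: tasks of under-served, high-priority jobs rank higher.
    weights = {"jobA": 2, "jobB": 1}
    def scheduler_priority(t): return weights.get(t.job_id, 0)
    nodes = [Node("n1", 1, [Task("jobB", frozenset({"n1"}))]), Node("n2", 1, [])]
    lost = [Task("jobA", frozenset({"n1"}))]
    print(schedule_recovery(lost, nodes, scheduler_priority))
```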

    Toward High-Performance Computing and Big Data Analytics Convergence: The Case of Spark-DIY

    Convergence between high-performance computing (HPC) and big data analytics (BDA) is now an established research area that has spawned new opportunities for unifying the platform layer and data abstractions in these ecosystems. This work presents an architectural model that enables the interoperability of established BDA and HPC execution models, reflecting the key design features that interest both the HPC and BDA communities, and including an abstract data collection and operational model that provides a unified interface for hybrid applications. This architecture can be implemented in different ways depending on the process- and data-centric platforms of choice and on the mechanisms put in place to meet the requirements of the architecture. The Spark-DIY platform is introduced in the paper as a prototype implementation of the proposed architecture. It preserves the interfaces and execution environment of the popular BDA platform Apache Spark, making it compatible with any Spark-based application and tool, while providing efficient communication and kernel execution via DIY, a powerful communication-pattern library built on top of MPI. Spark-DIY is then analyzed in terms of performance by building a representative use case from the hydrogeology domain, EnKF-HGS. This application is a clear example of how current HPC simulations are evolving toward hybrid HPC-BDA applications that integrate HPC simulations within a BDA environment. This work was supported in part by the Spanish Ministry of Economy, Industry and Competitiveness under Grant TIN2016-79637-P (Towards Unification of HPC and Big Data Paradigms), in part by the Spanish Ministry of Education under the FPU15/00422 Training Program for Academic and Teaching Staff Grant, in part by the Advanced Scientific Computing Research program, Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357, and in part by the DOE under Agreement DE-DC000122495, program manager Laura Biven.
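
    The architectural idea (a single collection and operator interface backed by interchangeable process-centric and block-based engines) can be sketched as below. The classes are illustrative stand-ins, not the Spark or DIY APIs; both backends are mocked locally so the example runs on its own.

```python
# A hedged sketch of the architectural idea only, not the Spark or DIY APIs:
# one collection/operator interface that hybrid applications code against,
# with interchangeable backends, one process-centric (BDA-style) and one
# block-based (HPC-style). Both backends are mocked locally here.
from abc import ABC, abstractmethod
from functools import reduce

class Backend(ABC):
    @abstractmethod
    def map(self, func, blocks): ...
    @abstractmethod
    def reduce(self, func, blocks): ...

class BDALikeBackend(Backend):
    """Stand-in for a Spark-style engine: operations applied over partitions."""
    def map(self, func, blocks):
        return [[func(x) for x in block] for block in blocks]
    def reduce(self, func, blocks):
        return reduce(func, (x for block in blocks for x in block))

class BlockBasedBackend(Backend):
    """Stand-in for a DIY/MPI-style engine: per-block kernels plus a
    reduction exchange (collapsed here into a local loop)."""
    def map(self, func, blocks):
        return [list(map(func, block)) for block in blocks]
    def reduce(self, func, blocks):
        partials = [reduce(func, block) for block in blocks if block]
        return reduce(func, partials)

class Collection:
    """The unified abstraction hybrid applications program against."""
    def __init__(self, blocks, backend: Backend):
        self.blocks, self.backend = blocks, backend
    def map(self, func):
        return Collection(self.backend.map(func, self.blocks), self.backend)
    def reduce(self, func):
        return self.backend.reduce(func, self.blocks)

if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5], [6]]
    for backend in (BDALikeBackend(), BlockBasedBackend()):
        total = Collection(data, backend).map(lambda x: x * x).reduce(lambda a, b: a + b)
        print(type(backend).__name__, total)   # same answer from either engine
```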

    Spark-DIY: A framework for interoperable Spark Operations with high performance Block-Based Data Models

    This work was partially funded by the Spanish Ministry of Economy, Industry and Competitiveness under grant TIN2016-79637-P "Towards Unification of HPC and Big Data Paradigms"; by the Spanish Ministry of Education under the FPU15/00422 Training Program for Academic and Teaching Staff Grant; by the Advanced Scientific Computing Research program, Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357; and by the DOE under agreement No. DE-DC000122495, program manager Laura Biven.

    A Performance and Energy Analysis of I/O Management Approaches for Exascale Systems

    The advent of fast, unprecedentedly scalable, yet energy-hungry exascale supercomputers poses a major challenge in sustaining a high performance-per-watt ratio. While much recent work has explored new approaches to I/O management, aiming to reduce the I/O performance bottleneck exhibited by HPC applications (and hence to improve application performance), there is comparatively little work investigating the impact of I/O management approaches on energy consumption. In this work, we explore how much energy a supercomputer consumes while running scientific simulations under various I/O management approaches. We closely examine three radically different I/O schemes: time partitioning, dedicated cores, and dedicated nodes. We implement the three approaches within the Damaris I/O middleware and perform extensive experiments with one of the target HPC applications of the Blue Waters sustained-petaflop/s supercomputer project: the CM1 atmospheric model. Our experimental results, obtained on the French Grid'5000 platform, highlight the differences between these three approaches and illustrate how various configurations of the application and of the system can impact performance and energy consumption.
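
    The three schemes differ mainly in which resources perform I/O. The sketch below, with illustrative node and core counts, shows how compute and I/O duties might be partitioned under each approach; the actual placement in Damaris depends on its configuration.

```python
# A hedged illustration of how the three I/O schemes assign resources
# (simplified; not Damaris configuration syntax). Each function returns
# (compute_units, io_units) as lists of (node, core) pairs.
def time_partitioning(nodes, cores):
    units = [(n, c) for n in range(nodes) for c in range(cores)]
    return units, units                 # the same cores compute, then do I/O

def dedicated_cores(nodes, cores, io_cores_per_node=1):
    compute = [(n, c) for n in range(nodes) for c in range(cores - io_cores_per_node)]
    io = [(n, c) for n in range(nodes) for c in range(cores - io_cores_per_node, cores)]
    return compute, io                  # I/O handled by a few cores on every node

def dedicated_nodes(nodes, cores, io_nodes=1):
    compute = [(n, c) for n in range(nodes - io_nodes) for c in range(cores)]
    io = [(n, c) for n in range(nodes - io_nodes, nodes) for c in range(cores)]
    return compute, io                  # I/O handled by whole separate nodes

if __name__ == "__main__":
    for scheme in (time_partitioning, dedicated_cores, dedicated_nodes):
        comp, io = scheme(4, 16)
        print(f"{scheme.__name__:18s} compute units: {len(comp):3d}  I/O units: {len(io):3d}")
```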

    Chronos: Failure-Aware Scheduling in Shared Hadoop Clusters

    Hadoop has emerged as the de facto state-of-the-art system for MapReduce-based data analytics. The reliability of Hadoop systems depends in part on how well they handle failures. Currently, Hadoop handles machine failures by re-executing all the tasks of the failed machines (i.e., executing recovery tasks). Unfortunately, this elegant solution is entirely entrusted to the core of Hadoop and hidden from Hadoop schedulers. This unawareness of failures may therefore prevent Hadoop schedulers from operating correctly towards meeting their objectives (e.g., fairness, job priority) and can significantly impact the performance of MapReduce applications. This paper presents Chronos, a failure-aware scheduling strategy that enables early yet smart action for fast failure recovery while still operating within a specific scheduler objective. Upon failure detection, rather than waiting an uncertain amount of time to acquire resources for recovery tasks, Chronos leverages a lightweight preemption technique to carefully allocate these resources. In addition, Chronos considers data locality when scheduling recovery tasks to further improve performance. We demonstrate the utility of Chronos by combining it with the Fifo and Fair schedulers. The experimental results show that Chronos recovers to a correct scheduling behavior within only a couple of seconds and reduces job completion times by up to 55% compared to state-of-the-art schedulers.

    Sur l'efficacité des traitements Big Data sur les plateformes partagées à grandes échelle: gestion des entrées-sorties et des pannes

    As of 2017, we live in a data-driven world where data-intensive applications bring fundamental improvements to our lives in many different areas such as business, science, health care, and security. This has boosted the growth of data volumes (i.e., the deluge of Big Data). To extract useful information from this huge amount of data, different data processing frameworks have emerged, such as MapReduce, Hadoop, and Spark. Traditionally, these frameworks run on large-scale platforms (i.e., HPC systems and clouds) to leverage their computation and storage power. Usually, these large-scale platforms are used concurrently by multiple users and multiple applications with the goal of better resource utilization. Although sharing these platforms brings benefits, it also raises several challenges, among which I/O management and failure management are the major ones that can impact efficient data processing. To this end, we first focus on I/O-related performance bottlenecks for Big Data applications on HPC systems. We start by characterizing the performance of Big Data applications on these systems and identify I/O interference and latency as the major performance bottlenecks. Next, we zoom in on the I/O interference problem to further understand its root causes. We then propose an I/O management scheme to mitigate the high latencies that Big Data applications may encounter on HPC systems. Moreover, we introduce interference models for Big Data and HPC applications based on the findings of our experimental study of the root causes of I/O interference, and we leverage these models to minimize the impact of interference on the performance of Big Data and HPC applications. Second, we focus on the impact of failures on the performance of Big Data applications by studying failure handling in shared MapReduce clusters. We introduce a failure-aware scheduler that enables fast failure recovery while optimizing data locality, thus improving application performance.
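
    A generic flavor of an I/O interference model (not the thesis's actual models, which are derived from its experiments) is sketched below: applications sharing the parallel file system split its aggregate bandwidth, so each application's I/O phases dilate under contention. The proportional-sharing assumption and all numbers are illustrative.

```python
# A hedged, generic sketch of an I/O interference model; the models developed
# in the thesis are experiment-driven and will differ.
def io_time_with_interference(volume_gb, requested_bw_gbps, peers_requested_bw_gbps,
                              aggregate_bw_gbps):
    """I/O time for one application when others access storage concurrently."""
    total_demand = requested_bw_gbps + sum(peers_requested_bw_gbps)
    if total_demand <= aggregate_bw_gbps:
        achieved = requested_bw_gbps                   # no contention
    else:
        # Proportional-sharing assumption; real systems are rarely this fair.
        achieved = aggregate_bw_gbps * requested_bw_gbps / total_demand
    return volume_gb / achieved

if __name__ == "__main__":
    alone = io_time_with_interference(512, 40, [], 100)
    shared = io_time_with_interference(512, 40, [40, 40], 100)
    print(f"alone: {alone:.1f}s  with two peers: {shared:.1f}s "
          f"(slowdown x{shared / alone:.2f})")
```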

    Preserving Fairness in Shared Hadoop Cluster: A Study on the Impact of (Non-) Preemptive Approaches

    Recently, MapReduce and its open-source implementation Hadoop have emerged as prevalent tools for big data analysis in the cloud. Fair resource allocation between jobs and users is an important issue, especially in multi-tenant environments such as clouds. Thus, several scheduling policies have been developed to preserve fairness in multi-tenant Hadoop clusters. At the core of these schedulers, simple (non-)preemptive approaches are employed to free resources for tasks belonging to jobs with a lower share. For example, the Hadoop Fair Scheduler is equipped with two approaches: wait and kill. While wait may introduce serious fairness violations, kill may result in a huge waste of resources. Recently, however, some works have introduced new preemption approaches (e.g., pause-resume) in shared Hadoop clusters. In this work, we closely examine three approaches (wait, kill, and pause-resume) when the Hadoop Fair Scheduler is employed to ensure fair execution among multiple concurrent jobs. We perform extensive experiments to assess the impact of these approaches on performance and resource utilization while ensuring fairness. Our experimental results bring out the differences between these approaches and illustrate that each of them is sub-optimal for some workloads and cluster configurations: the efficiency in achieving fairness and the overall performance vary with the workload composition, the resource availability, and the cost of the adopted preemption technique.
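
    The trade-off among the three approaches can be made concrete with a small, hedged cost sketch: wait pays in delayed fairness, kill pays in discarded work, and pause-resume pays a checkpoint/restore overhead. The functions and numbers below are illustrative, not measurements from the paper.

```python
# A hedged back-of-the-envelope comparison of what each approach costs when a
# slot must be freed for a job below its fair share. The cost breakdown and
# the default overheads are illustrative assumptions.
def cost_wait(remaining_time_of_running_task):
    """Fairness is violated until the running task finishes on its own."""
    return {"fairness_delay": remaining_time_of_running_task, "wasted_work": 0.0}

def cost_kill(elapsed_time_of_running_task, kill_overhead=1.0):
    """The slot is freed almost immediately, but all progress of the killed
    task is thrown away and must be redone later."""
    return {"fairness_delay": kill_overhead, "wasted_work": elapsed_time_of_running_task}

def cost_pause_resume(checkpoint_overhead=5.0):
    """The slot is freed once the task's state is saved; no work is lost, at
    the price of the checkpoint/restore overhead."""
    return {"fairness_delay": checkpoint_overhead, "wasted_work": 0.0}

if __name__ == "__main__":
    elapsed, remaining = 240.0, 360.0   # seconds into / left of the running task
    print("wait        :", cost_wait(remaining))
    print("kill        :", cost_kill(elapsed))
    print("pause-resume:", cost_pause_resume())
```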